Closed Bug 1308612 Opened 8 years ago Closed 8 years ago

Enabling HSTS priming making website really slow in PGO builds

Categories

(Core :: DOM: Security, defect, P3)

x86_64
Windows 10
defect

Tracking

()

RESOLVED DUPLICATE of bug 1311807

People

(Reporter: ehoogeveen, Assigned: kmckinley)

Details

(Whiteboard: [domsecurity-backlog1] [hsts-priming])

I noticed recently that the very NSFW website exhentai.org (the sister site to e-hentai.org) has become very slow in Nightly. Loading a gallery on e-hentai is almost instant, but loading that same gallery on exhentai takes forever. The only difference between e-hentai and exhentai is that the latter uses HTTPS where the former uses plain HTTP.

I tried to bisect the regression, but ran into a snag: it seems only PGO builds are affected! Using some local builds, I was finally able to figure out that the cause is bug 1246540; setting security.mixed_content.send_hsts_priming to false makes exhentai go back to normal.

Unfortunately, reproducing this without an existing e-hentai account is going to be difficult: newly created accounts don't have the privileges needed to access exhentai and will simply get a 'sad panda' image. If by chance you do have an account, this SFW gallery should make the difference in loading times obvious:

http://g.e-hentai.org/g/963716/471b198d63/
https://exhentai.org/g/963716/471b198d63/

I've been trying to set up an e-hentai account for testing that can access exhentai, but no luck so far.

Considering that this only happens in PGO builds, is it possible that there's a race condition in the code somewhere? I don't know anything about this area of Gecko, but let me know if there are any logs I can gather, or changes I can test. Doing PGO builds locally takes a while, but I *can* reproduce using them.

Steps to reproduce:
1) Have an e-hentai account with access to exhentai
2) Use a win64 PGO build with bug 1246540 fixed (e.g. the 2016-09-29 Nightly)
3) With security.mixed_content.send_hsts_priming set to true (the default), visit the following SFW gallery on both e-hentai and exhentai:
    http://g.e-hentai.org/g/963716/471b198d63/
    https://exhentai.org/g/963716/471b198d63/

Expected result:
The galleries load equally quickly on both websites

Actual result:
The gallery loads very slowly on exhentai, and going to the next page after a page finishes loading doesn't visibly trigger a load.
Flags: needinfo?(kmckinley)
By the way, while the gallery may be SFW, the ads on the website most definitely are not. uBlock Origin with default settings seems to block them just fine.
Kate will know for sure, but I suspect that the issue is that the caching of priming results is getting screwed up by PGO.  Not sure what to do about that.

I'm honestly kind of disinclined to do much, though, since the site owner can fix this on his own by fixing the mixed content issues.
I think it's important to at least understand where things are going off the rails. PGO-only bugs are nasty, and this might be affecting other less-maintained websites as well (though I haven't seen any). I should note that the slowdown here isn't minor: I'm talking about 10+ seconds of no activity, presumably until some timeout triggers and it finally decides to load. It makes the website basically unusable.
I agree with Emanuel; I want to understand why this behavior is happening. Unfortunately, without access I can't really test, and I don't have a Windows VM on this laptop.

After you visit the ehentai site, if you reload the page does the reload happen quickly or slowly?
Do you have another, public gallery site that exhibits this behavior?
Can you please visit the site with a clean profile, open the dev tool Network tab, and provide the traces to me? I need the request/response pairs along with all headers for any HEAD requests you see in the trace.
Flags: needinfo?(kmckinley) → needinfo?(emanuel.hoogeveen)
My internet at home has been out all day today. I'll get to this as soon as it's back.
Damn, I think I missed my window - I can't reproduce the problem anymore; the maintainer must have changed something in the website. I may have to close this bug RESOLVED INCOMPLETE.
Flags: needinfo?(emanuel.hoogeveen)
Oh weird, I can still reproduce on my main profile. I definitely saw the bug with a clean profile before. I closed all pinned tabs, cleared my history and cache and removed all my add-ons and I can still reproduce. I'll try to get you those traces now.

> After you visit the ehentai site, if you reload the page does the reload happen quickly or slowly?

Slowly. The timeout is probably unchanged, though I haven't timed it exactly (it's like 20 seconds).

> Do you have another, public gallery site that exhibits this behavior?

Unfortunately not. The public site doesn't use HTTPS, and I don't see the problem there.

> Can you please visit the site with a clean profile, open the dev tool Network tab, and provide the traces to me? I need the request/response pairs along with all headers for any HEAD requests you see in the trace.

Going to do this now.
I've e-mailed the traces, since I believe they contain private information.
If you visit https://www.deviantart.com, which has a lot of mixed-content loads, do you encounter the same issue? I am finally back in an office with a real internet connection.


Unfortunately, nothing in the traces seems to help. Another thing to try is to set the environment variable MOZ_LOG=nsHttp:5 when launching the firefox exe and capture all the output.
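For anyone gathering this log, the following is a minimal Python sketch of one way to launch Firefox with MOZ_LOG set. The helper names `build_logging_env` and `launch_with_http_log` are hypothetical, and the path to the Firefox executable is something the caller has to supply. MOZ_LOG_FILE is used here on the assumption that writing the log to a file is more convenient than capturing stderr; it is optional.

```python
import os
import subprocess


def build_logging_env(base_env, modules="nsHttp:5", log_file=None):
    """Return a copy of base_env with Gecko HTTP logging enabled."""
    env = dict(base_env)
    # MOZ_LOG selects the log modules and verbosity, as requested in the comment.
    env["MOZ_LOG"] = modules
    if log_file is not None:
        # MOZ_LOG_FILE sends the log to a file instead of stderr.
        env["MOZ_LOG_FILE"] = log_file
    return env


def launch_with_http_log(firefox_path, log_file):
    """Start Firefox with HTTP logging; firefox_path is supplied by the caller."""
    env = build_logging_env(os.environ, log_file=log_file)
    return subprocess.Popen([firefox_path], env=env)
```

Calling `launch_with_http_log` with the full path to the Firefox executable and a log file name of your choosing starts the browser with logging enabled; reproduce the slow load, close the browser, and attach the resulting log.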
Assignee: nobody → kmckinley
Status: NEW → ASSIGNED
Flags: needinfo?(emanuel.hoogeveen)
I don't see the problem on deviantart, no. Sounds like you might have more of a path forward in bug 1306107, so I'll watch that for now.
Flags: needinfo?(emanuel.hoogeveen)
Depends on: 1310955
I believe that https://bugzilla.mozilla.org/show_bug.cgi?id=1306107 inadvertently provided a test case with https://bug1306107.bmoattachments.org/attachment.cgi?id=8796318.

What is happening in that case is that the CSS file is referenced over HTTP, and would be blocked as active content, so a priming request is sent. The priming request times out after 30 seconds, at which point the load is blocked. Since the page relies on onLoad events, those do not fire until after the load is blocked. Timing out like this is pretty easy to do, for example with a misconfigured firewall that simply drops packets instead of sending RST.

Bug 1310955 exacerbates this problem because it ends up sending too many priming requests. However, once that lands, there are a couple of options.

1) Do nothing. Once a day, a site with this problem will see a 30s delay in loading.
2) Increase the cache lifetime, e.g., to 48 hours, 1 week, or 1 month, but keep it short enough that frequent visitors will still benefit.
3) Decrease the per-connection timeout. A MITM attacker can already drop all packets going to :443, so this only penalizes users on slow or intermittent connections, not those with an active adversary.

I'm inclined to do #1, but am open to #2. I think #3 reduces any security this provides.
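
To make the trade-off between the options concrete, here is a toy Python model of a priming-result cache. It is only a sketch under stated assumptions, not Gecko's implementation: the class `PrimingCache`, the function `total_delay`, and the host `static.example` are hypothetical, the 24-hour lifetime is inferred from the once-a-day delay described in option 1, and the 5-second timeout used for option 3 is an arbitrary illustration.

```python
from dataclasses import dataclass, field

HOUR_S = 3600.0
PRIMING_TIMEOUT_S = 30.0  # timeout quoted in the comment above


@dataclass
class PrimingCache:
    """Toy model of a priming-result cache; not Gecko's actual data structure."""
    ttl_s: float
    timeout_s: float = PRIMING_TIMEOUT_S
    expiry_s: dict = field(default_factory=dict)  # host -> time the cached result expires

    def load_delay(self, host, now_s, host_times_out):
        """Extra seconds spent waiting on priming for an HTTP subresource on host."""
        if now_s < self.expiry_s.get(host, float("-inf")):
            return 0.0  # cached result still valid: no priming request is sent
        # No valid entry: a priming request is sent and its outcome is cached.
        self.expiry_s[host] = now_s + self.ttl_s
        return self.timeout_s if host_times_out else 0.0


def total_delay(ttl_s, visit_times_s, timeout_s=PRIMING_TIMEOUT_S):
    """Sum the priming delay over all visits to a host whose port 443 drops packets."""
    cache = PrimingCache(ttl_s=ttl_s, timeout_s=timeout_s)
    return sum(cache.load_delay("static.example", t, True) for t in visit_times_s)


# Two visits a day for one week (14 visits, 12 hours apart).
visits = [i * 12 * HOUR_S for i in range(14)]
print(total_delay(24 * HOUR_S, visits))                 # option 1: 24-hour cache
print(total_delay(7 * 24 * HOUR_S, visits))             # option 2: 1-week cache
print(total_delay(24 * HOUR_S, visits, timeout_s=5.0))  # option 3: shorter timeout
```

Under these assumptions the model gives 210 seconds of waiting over the week for option 1, 30 seconds for option 2, and 35 seconds for option 3. It says nothing about the security cost of option 3, which is the reason given above for not preferring it.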
This all sounds plausible, and I'll probably try to let the maintainer know about the problem so they can fix it on their end. What I still don't understand is why this is PGO only - I realize the testcase from bug 1306107 isn't, but why would this one behave differently depending on timing?
Priority: -- → P3
Whiteboard: [domsecurity-backlog1]
Whiteboard: [domsecurity-backlog1] → [domsecurity-backlog1] [hsts-priming]
I have had this problem on exhentai since Firefox 51, too.

But I just found that I can't reproduce it on Nightly.

I can reproduce it on 51.0b1 (32-bit) and 51.0a2 (2016-11-14).

Is it because Nightly is not a "PGO" build? Sorry, I have no idea what PGO means.
OK, I can't even reproduce it on the 2016-11-10 Nightly PGO build. Can you double-check that it's related to PGO?
I tried different dates (2016/7/11, 2016/8/11, 2016/9/10) with PGO x64 and x86 Nightly builds (with the help of mozregression); none of them reproduce the problem.
Marking as a duplicate of bug 1311807 since much of the discussion takes place in that bug and it also has reproductions that occur on public sites.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Depends on: 1313595
Resolution: --- → DUPLICATE